Global optimization of nonlinear latent variable models in community ecology

Canadian Mathematical Society, Winter 2014, Hamilton Ontario

Steve Walker, Department of Mathematics and Statistics, McMaster University

A small example -- Les poissons du Doubs

Drawing

Verneaux (1973)

A small example -- Les poissons du Doubs

    Environmental variables         

distance to the source
altitude
slope
minimum average debit
pH
total hardness of water
phosphates
nitrates
ammonia nitrogen
dissolved oxygen
biological demand for oxygen

Drawing Drawing

Verneaux (1973)

Correlations among species abundances ...

plot of chunk unnamed-chunk-1

... not fully explained by environment

plot of chunk unnamed-chunk-2

Eg: Wood-decaying fungi

Drawing

Ovaskainen et al (2010)

Eg: Wood-decaying fungi

Model

  1. Fungal species \(j\) found on spruce log \(i\) when \(z_{ij} > 0\)
  2. \(z_{ij} = \mu_{ij} + \mathrm{logit}(\Phi(e_{ij}))\)
  3. (Environment) \(\mathbf{\mu} = \mathbf{X}\mathbf{\beta}\)
  4. (Correlated residuals) \(\mathbf{e}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{R})\)

Where:

  • \(\Phi\) is the cummulative density function of the normal distribution
  • \(\mathbf{X}\) is a matrix of environmental variables
  • \(\mathbf{R}\) is a correlation matrix

Estimated correlation matrix, \(\mathbf{R}\)

Drawing

Ovaskainen et al (2010)

Two sources of these correlations

1. Species interactions

  • Competition
  • Predation
  • Mutualism
  • Apparent competition

2. Unmeasured environmental variables

  • Even something as simple as temperature is difficult to fully characterize, due to heterogeneity
  • Latent variables can be used to model correlations induced by unmeasured environmental variables

Using latent variables to model correlations

plot of chunk unnamed-chunk-3

Using latent variables to model correlations

plot of chunk unnamed-chunk-4

Using latent variables to model correlations

plot of chunk unnamed-chunk-5

Using latent variables to model correlations

plot of chunk unnamed-chunk-6

Using latent variables to model correlations

plot of chunk unnamed-chunk-7

Using latent variables to model correlations

plot of chunk unnamed-chunk-8

Many tools available for fitting (generalized) linear latent variable models

1. PCA

2. Probabilistic PCA

3. Generalized linear PCA

4. Factor analysis

5. Structural equation modelling

6. Some item response models

7. BayesComm

But we know the relationship between environmental variables and species abundances are sometimes nonlinear

Drawing

e.g. Whittaker (1952)

Nonlinear latent variable models

plot of chunk unnamed-chunk-9

Most approaches to fitting nonlinear latent variable models suffer from one of two drawbacks

1. Algorithmic complexity effectively prohibits finding the global optimum

  • e.g. Gauch et al. (1974); Yee (2004); Walker and Jackson (2011); Harris (preprint)

2. Necessary algorithmic heuristics obscure what model is actually being fitted

  • e.g. ter Braak (1985); Minchin (1987)

My objective: Give an algorithm for global optimization of simple nonlinear latent variable models

The model

\[ \underbrace{y_{ij}}_{\text{abundance of species } j \text{ at site } i} \sim \mathcal{N}(\mu_{ij}, \sigma^2) \]

\[ \mu_{ij} = a_j + b_jx_i + c_jx_i^2 \]

1. \(x_i \sim \mathcal{N}(0, 1)\) is the value of the latent variable at site \(i\)

2. \(a_j\), \(b_j\), and \(c_j\) are parameters describing species \(j\)

3. We constrain \(c_j < 0\) to obtain 'hump-shape' curves

The optimization problem

Minimize: \[ \mathcal{L}(x, a, b, c) = \sum_i\sum_j\left(y_{ij}- a_j - b_jx_i - c_jx_i^2\right)^2 + \sum_ix_i^2 \]

Over all \(x_i\), \(a_j\), \(b_j\), and \(c_j\)

Subject to \(c_j < 0\)

This is a difficult problem -- an example slice

plot of chunk unnamed-chunk-10

1. Local minima (not convex)

2. If initial parameters are bad, you could easily get stuck

3. As the species parameters are updated, the local minima become

even more difficult to escape

4. Result: combinatorial optimization

Simulated data

plot of chunk unnamed-chunk-11

Estimated vrs true latent variables (L-BFGS-B)

plot of chunk unnamed-chunk-12

Estimated latent variables from random starting locations with L-BFGS-B

plot of chunk unnamed-chunk-13

plot of chunk unnamed-chunk-14

The new idea: Don't fit the true model!

Fit one that's more complex!

\[ \underbrace{y_{ij}}_{\text{abundance of species } j \text{ at site } i} \sim \mathcal{N}(\mu_{ij}, \sigma^2) \]

\[ \mu_{ij} = a_j + b_jx_i + c_jz_i \]

1. \(x_i \sim \mathcal{N}(0, 1)\) is the value of the latent variable at site \(i\)

2. \(z_i | x_i \sim \mathcal{N}(x_i^2, \lambda^{-1})\), for regularization parameter \(\lambda\)

3. \(a_j\), \(b_j\), and \(c_j\) are parameters describing species \(j\)

4. We constrain \(c_j < 0\) to obtain 'hump-shape' curves

The key to the new idea:

  1. More complex --> two latent variables instead of one
  2. But a prior encourages a nonlinear relationship between the latent variables
  3. When the prior variance goes to \(\infty\) (\(\lambda = 0\)), the regular linear (i.e. two-axis PCA-like) model is obtained
  4. When the prior variance is \(0\) (\(\lambda \to \infty\)), the original nonlinear model is obtained
  5. Therefore, we can fit the PCA-like model first (easy!), and then tune \(\lambda\) up until we reach the desired level of nonlinearity

The new optimization problem

Minimize: \[ \mathcal{L}(x, z, a, b, c) = \sum_i\sum_j\left(y_{ij}-a_j - b_jx_i - c_jz_i\right)^2 \]

\[ + \sum_ix_i^2 + \lambda\sum_i\left(z_i-x_i^2\right)^2\]

Over all \(x_i\), \(z_i\), \(a_j\), \(b_j\), and \(c_j\)

Subject to \(c_j < 0\)

Fit to the simulations (again with L-BFGS-B)

plot of chunk unnamed-chunk-15

Simulated data

plot of chunk unnamed-chunk-16

Fit to the simulations again

plot of chunk unnamed-chunk-17

Free lunch?

  1. No ... the price is having to optimize the added parameter vector.
  2. Need to optimize over a bigger space (so a little slower convergence times).
  3. But this bigger space is easier to explore because it doesn't contain the local holes that you can fall into.

Next steps

  1. Write usable code.
  2. More extensive simulation experiments.
  3. Find out why it works!
  4. Find and prove conditions for reaching a global optimum
    • convexity properties won't help here because the basic model is bilinear, not linear
    • e.g. SVD solves a non-convex optimization problem with a fast globally convergent algorithms
  5. Consider more mechanistic, complex, and realistic models

Why does it work?

Drawing